feat(skills): support loading skills from GitHub/Git URLs#2091
feat(skills): support loading skills from GitHub/Git URLs#2091dgallitelli wants to merge 3 commits intostrands-agents:mainfrom
Conversation
Add support for remote Git repository URLs as skill sources in AgentSkills, enabling teams to share and reference skills directly from GitHub without manual cloning. Closes strands-agents#2090 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse GitHub web URLs like /tree/<ref>/path to extract the clone URL, branch, and subdirectory path. This enables loading skills from subdirectories within mono-repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
01a4fc3 to
83966f6
Compare
|
/strands review |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" |
There was a problem hiding this comment.
Issue: _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" is evaluated at module import time. In containerized environments (Lambda, Docker) where HOME may not be set, Path.home() raises RuntimeError.
Suggestion: Evaluate this lazily, e.g., inside clone_skill_repo when cache_dir is None. You could use a sentinel or simply inline Path.home() / ".cache" / "strands" / "skills" inside the function:
cache_dir = cache_dir or Path.home() / ".cache" / "strands" / "skills"This also respects XDG_CACHE_HOME if you want to be a good citizen on Linux:
import os
cache_dir = cache_dir or Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache")) / "strands" / "skills"| _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" | ||
|
|
||
| # Patterns that indicate a string is a URL rather than a local path | ||
| _URL_PREFIXES = ("https://", "http://", "git@", "ssh://") |
There was a problem hiding this comment.
Issue: http:// is included in URL prefixes, allowing skill repos to be cloned over plaintext HTTP. This exposes users to man-in-the-middle attacks where a malicious actor could inject arbitrary code into the cloned skill.
Suggestion: Remove "http://" from _URL_PREFIXES, or at minimum log a warning when an http:// URL is detected in clone_skill_repo. Git itself warns about this, but the SDK should guide users toward secure defaults (tenet: "the obvious path is the happy path").
| key = cache_key(url, ref) | ||
| target = cache_dir / key | ||
|
|
||
| if not target.exists(): |
There was a problem hiding this comment.
Issue: Race condition — if two threads/processes call clone_skill_repo concurrently with the same URL, both pass the not target.exists() check and attempt to clone into the same directory. One will fail or produce a corrupt state.
Suggestion: Use an atomic pattern: clone to a temporary directory first, then os.rename() (atomic on the same filesystem) to the target path. If the target already exists by the time rename happens, just delete the temp clone. Something like:
import tempfile
with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
tmp_target = Path(tmp) / "repo"
# ... clone into tmp_target ...
try:
tmp_target.rename(target)
except OSError:
# Another process won the race — that's fine, use their clone
pass| return hashlib.sha256(key_input.encode()).hexdigest()[:16] | ||
|
|
||
|
|
||
| def clone_skill_repo( |
There was a problem hiding this comment.
Issue: There is no mechanism to refresh a cached clone. When no ref is specified, the default branch is cloned. If the remote repo is updated, the user gets stale content indefinitely with no obvious way to refresh (other than manually deleting ~/.cache/strands/skills/).
Suggestion: Consider adding a force_refresh: bool = False parameter (or documenting the cache invalidation story clearly). At minimum, document in the docstring that users should delete the cache directory or pin a ref for reproducibility. This is especially important since the cache key is a hash — users can't easily find and delete the right directory.
| List of Skill instances loaded from the repository. | ||
|
|
||
| Raises: | ||
| RuntimeError: If the repository cannot be cloned or ``git`` is not |
There was a problem hiding this comment.
Issue: The Raises section documents RuntimeError but the method also raises ValueError (via is_url check on line 377). Additionally, calling Skill.from_url() with a non-URL raises ValueError, but using the same string through AgentSkills(skills=["./path"]) treats it as a local path. This inconsistency could confuse users.
Suggestion: Add ValueError to the Raises section in the docstring.
|
Issue: This PR introduces new public API surface ( Key questions an API reviewer should evaluate:
|
|
Issue: The Suggestion: Add |
| from ._url_loader import clone_skill_repo, is_url, parse_url_ref | ||
|
|
||
| if not is_url(url): | ||
| raise ValueError(f"url=<{url}> | not a valid remote URL") |
There was a problem hiding this comment.
Issue: The from_url ValueError uses an f-string in the exception message: f"url=<{url}> | not a valid remote URL". This is consistent with the structured logging style used elsewhere, but the AGENTS.md specifically says "Don't use f-strings in logging calls." Since this is an exception message (not a logging call), this is acceptable — but the logging calls throughout the module correctly use %s format, which is good.
However, the RuntimeError raised in clone_skill_repo at line 189 also uses an f-string: f"url=<{url}>, ref=<{ref}> | failed to clone...". This is fine since it's an exception message, not a log statement.
Review SummaryAssessment: Request Changes This is a well-structured feature that fills a real user need. The design decisions (subprocess git, shallow clone, hash-based caching) are sound and well-documented in the PR description. The test coverage is thorough with 32 new tests. Review Categories
The core implementation is solid — addressing the security and reliability items above would make this ready to merge. |
- Security: remove http:// support (MITM risk), only allow https/ssh/git@ - Reliability: atomic clone via tempdir + rename to prevent race conditions - Reliability: evaluate cache dir lazily (not at import time) for containers - Usability: respect XDG_CACHE_HOME for cache directory - Usability: add force_refresh parameter to re-clone stale unpinned refs - Docs: add ValueError to Skill.from_url() Raises section - Docs: add _url_loader.py to AGENTS.md directory tree - Tests: add coverage for XDG_CACHE_HOME, force_refresh, race condition, http:// rejection (184 tests total) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough review! I've addressed all the feedback in the latest push: Security
Reliability
Usability
Docs
Tests
Re: API design (
|
Summary
Closes #2090
AgentSkillsandSkill.from_url()https://,git@, andssh://URLs with optional@refsuffix for branch/tag pinning (e.g.,https://github.com/org/skill@v1.0.0)--depth 1) and cached locally at~/.cache/strands/skills/(configurable viacache_dirparameter)Usage
Changes
_url_loader.py(new)skill.pySkill.from_url()classmethodagent_skills.py_resolve_skills()to detect URL strings; addedcache_dirparametertest_url_loader.py(new)test_skill.pySkill.from_url()test_agent_skills.pyAgentSkillsDesign decisions
gitwhich is universally available. Handles auth naturally (SSH keys, credential helpers, tokens).--depth 1minimizes bandwidth and disk for skill repos.sha256(url + ref)[:16]as cache directory name. Repeated loads are instant.subprocess,hashlib,shutil).Test plan
@hookdecorator typing)aws-data-agent-skill)obra/superpowers/brainstorming)